
I have a large dataset covering 20 cities and I'd like to split it into smaller datasets, one per city. Each variable in the dataset will be exported to its own text file.

foreach i in Denver Blacksburg {
use "D:\Data\All\AggregatedCount.dta", clear

drop if MetroArea != `i'

export delimited lnbike using "D:\Data/`"`i'"'/DV/lnbike.txt", delimiter(tab) replace
export delimited lnped using "D:\Data/`"`i'"'/DV/lnped.txt", delimiter(tab) replace 
}

I tried `i' and `"`i'"' in the export commands, but neither worked. The error is

"Denver not found."

I also have cities whose names contain a space, such as Los Angeles. I tried

local city `" "Blacksburg" "Los Angeles" "Denver" "'
foreach i of city {
use "D:\Data\All\AggregatedCount.dta", clear

drop if MetroArea != `i'

export delimited lnbike using "D:/Data/`"`i'"'/DV/lnbike.txt", delimiter(tab) replace
export delimited lnped using "D:/Data/`"`i'"'/DV/lnped.txt", delimiter(tab) replace 
}

This didn't work either. Do you have any suggestions?

  • Thanks @Hack-R. Do you have any suggestion to solve this in R? Commented Sep 4, 2017 at 2:19
  • Eric HB helped me with Stata in his answer below, and I'm also interested in knowing the solution in R as well. Commented Sep 4, 2017 at 4:27
  • Sure, sorry I'm just getting back to my laptop now. So, you want to save a separate file for every level of a factor variable (every city), right? for(l in levels(iris$Species)){data.table::fwrite(iris[iris$Species==l,],paste0(l,".csv"))} You could also use write.csv() instead of fwrite so that you don't need a library, but fwrite is faster. Commented Sep 4, 2017 at 14:31

2 Answers


If you want to continue with Stata, the only thing you would need to change in your first code snippet is

`"`i'"'

to

\`i'

Note the \ so that your code looks like:

export delimited lnbike using "D:\Data\\`i'/DV/lnbike.txt", delimiter(tab) replace

(Personally, I would change all of the forward slashes (/) to backslashes (\) in general anyway.) The extra backslash is needed because a backslash before a left single quote in a string evaluates to just the left single quote, so the macro is not expanded. Having the second backslash tells Stata that you want the local macro i to be evaluated.

Your second code snippet could work if you also changed

foreach i of city {

to

foreach i of local city {

It might be helpful to read up on local macros: they can definitely be confusing, but are powerful if you know how to use them.
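For reference, here is a minimal sketch of how the second snippet might look with the fixes applied. It is untested against your data and assumes that MetroArea is a string variable and that the D:/Data/<city>/DV folders already exist. It loops with foreach ... of local (the form Stata documents for lists held in a local macro), quotes the city name in the comparison, and uses forward slashes so that no backslash sits in front of the macro.

```stata
* compound double quotes keep "Los Angeles" together as one list element
local city `" "Blacksburg" "Los Angeles" "Denver" "'

foreach i of local city {
    use "D:/Data/All/AggregatedCount.dta", clear

    * string comparison: the city name must be wrapped in double quotes
    drop if MetroArea != "`i'"

    * forward slashes avoid the backslash-before-macro problem
    export delimited lnbike using "D:/Data/`i'/DV/lnbike.txt", delimiter(tab) replace
    export delimited lnped using "D:/Data/`i'/DV/lnped.txt", delimiter(tab) replace
}
```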


1 Comment

Thanks Eric HB! I also tried drop if MetroArea != `"`i'"' and it worked. Good to know about the double backslashes!

This answer overlaps with the helpful answer by @Eric HB.

Given 20 (or more) cities, you should not want to type out the city names: that is tedious, error-prone, and unnecessary. Nor do you need to read in the dataset again and again, because you can export just the part you want with an if qualifier. This should get you closer.

use "D:/Data/All/AggregatedCount.dta", clear

* result is integers 1 up, with names as value labels
egen which = group(MetroArea), label 
* how many cities: r(max), the maximum, is the number  
su which, meanonly 

forval i = 1/`r(max)' { 
     * look up city name for informative filename  
     local where : label (which) `i' 
     export delimited lnbike if which == `i' using "D:/Data/`where'/DV/lnbike.txt", delimiter(tab) replace
     export delimited lnped if which == `i' using "D:/Data/`where'/DV/lnped.txt", delimiter(tab) replace 
}
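To see what which and where hold at each step, the same lookup can be tried on a toy dataset first. This is only an illustrative sketch: the four rows below are made up, and nothing is written to disk.

```stata
clear
* made-up rows, for illustration only
input str12 MetroArea lnbike
"Denver" 1.2
"Los Angeles" 2.3
"Denver" 0.7
"Blacksburg" 1.9
end

* integers 1 up in sorted order of MetroArea, with city names as value labels
egen which = group(MetroArea), label
su which, meanonly

forval i = 1/`r(max)' {
    * value label attached to code i, i.e. the city name
    local where : label (which) `i'
    count if which == `i'
    display "`i' -> `where' (" r(N) " rows)"
}
```

Names containing spaces need no special handling here, because the label text is looked up rather than typed.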

Some principles not yet discussed:

-- When testing for literal strings, you need " " or compound double quotes to delimit such strings. Otherwise Stata thinks you mean a variable or scalar name. This was your first bug, as given

drop if MetroArea != `i' 

is interpreted as

drop if MetroArea != Denver 

Stata can't find a variable Denver. As you found, you need

drop if MetroArea != "`i'" 

-- Windows uses the backslash as a separator in file and directory names, but Stata also uses the backslash as an escape character. If a local macro reference directly follows such a separator, the backslash escapes the left quote and the result can be quite wrong. This is documented at [U] 18.3.11 in this manual chapter and also in this note. Forward slashes are never a problem, and Stata understands them as you intend, even on Windows.
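A quick way to check the path behaviour for yourself is to display the same path written three ways. This is a small sketch using a throwaway local named city; compare what each line prints in your own Stata session.

```stata
local city "Denver"

* single backslash directly before the macro: the macro is escaped, not expanded
display "D:\Data\`city'\DV\lnbike.txt"

* doubled backslash before the macro, as suggested in the other answer
display "D:\Data\\`city'\DV\lnbike.txt"

* forward slashes: nothing to escape, the macro is expanded as intended
display "D:/Data/`city'/DV/lnbike.txt"
```

Using forward slashes throughout means you never have to remember which backslashes need doubling.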

All that said, it is difficult to believe that you will be better off with lots of little files, but that depends on what you want to do with them.
